CHAPTER 1 Biostatistics 101 13
A Matter of Life and Death: Working with Survival Data
Sooner or later, everyone dies, and in biological research, it becomes especially
important to characterize that sooner-or-later part as accurately as possible using
survival analysis techniques. But characterizing survival can get tricky. It’s possible
to say that patients may live an average of 5.3 years after they are diagnosed
with a particular disease. But what is the exact survival experience? Imagine you
do a study with patients who have this disease. You may ask: Do all patients tend
to live around five or six years, or do half the patients die within the first few
months, and the other half survive ten years or more? And what if some patients
live longer than the observational period of your study? How do you include them
in your analysis? And what about participants who stopped returning calls from
your study staff? You do not know if these dropouts went on to live or die. How do
you include their data in your analysis?
The need to study survival with data like these led to the development of survival
analysis techniques. But survival analysis isn’t limited to studying death as an
outcome. You can also use it to study the time to the first occurrence of non-death
events, such as remission or recurrence of cancer, the diagnosis of a particular
condition, or the resolution of a particular condition.
Survival analysis techniques are presented in Part 6.
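To make the censoring problem concrete, here is a minimal Python sketch of the Kaplan-Meier (product-limit) estimator. It is one widely used way to estimate a survival curve when some participants are censored, meaning they were still alive at the end of the study or were lost to follow-up. The function name `kaplan_meier` and the small data set are hypothetical, made up for illustration. Each subject contributes a follow-up time and an event flag (1 = died, 0 = censored). A censored subject is counted as at risk at every death time up to and including their last contact and then drops out. This way their partial follow-up is used rather than discarded.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, estimated survival) pairs, one per distinct death time.

    events[i] is 1 if subject i died at times[i], or 0 if subject i was
    censored at times[i] (alive at study end, or lost to follow-up).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    survival = 1.0
    steps = []
    # The curve only steps down at times when at least one death occurred.
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        # Everyone still under observation at time t is "at risk",
        # including subjects who are censored exactly at t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        # Multiply by the conditional probability of surviving past t.
        survival *= 1 - deaths / at_risk
        steps.append((t, survival))
    return steps


# Hypothetical follow-up times (years) and event flags for 8 patients.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(times, events):
    print(f"t = {t:>2}: estimated survival = {s:.3f}")
```

With these illustrative data, the estimate steps down only at the four death times (2, 3, 5, and 8) and falls to 0.4 after time 8. The censored times (3, 6, 9, and 12) shrink the risk set without producing a step of their own.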
Getting to Know Statistical Distributions
Statistics books always contain tables, so why should this one be any different?
Back in the not-so-good old days, when analysts had to do statistical calculations
by hand, they needed to use tables of the common statistical distributions to complete
the calculation of the significance test. They needed tables for the normal
distribution, Student t, chi-square, Fisher F, and others. Now, software does all
this for you, including calculating exact p values, so these printed tables aren’t
necessary anymore.
But you should still be familiar with the common statistical distributions that may
describe the fluctuations in your data, or that may be referenced in the course of
performing a statistical calculation. Chapter 24 contains a list of commonly used
distribution functions, with explanations of where you can expect to encounter
those distributions and what they look like. We also include a description of some
of their properties and how they’re related to other distributions. Some of them
are accompanied by a small table of critical values, corresponding to statistical
significance at α = 0.05.
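As an illustration of how software replaces the printed tables, here is a minimal sketch that uses only Python’s standard-library `statistics.NormalDist` (available in Python 3.8 and later). It computes the two-sided critical value of the standard normal distribution at α = 0.05 and an exact two-sided p value for a hypothetical test statistic z = 2.5. It also computes the chi-square critical value with 1 degree of freedom. That value equals the square of the two-sided normal critical value, because the square of a standard normal variable follows a chi-square distribution with 1 degree of freedom. The z value is made up for the example. For Student t, Fisher F, or chi-square with more degrees of freedom, you would need a statistics library such as SciPy rather than the standard library.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Two-sided critical value: the z that leaves alpha/2 in each tail (about 1.96).
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)

# Exact two-sided p value for a hypothetical observed test statistic.
z_observed = 2.5
p_value = 2 * (1 - std_normal.cdf(abs(z_observed)))

# Chi-square with 1 df is the distribution of Z**2, so its upper-tail
# critical value at alpha is the square of the two-sided z critical value.
chi2_1df_crit = z_crit ** 2

print(f"z critical value (two-sided, alpha = {ALPHA}): {z_crit:.4f}")
print(f"two-sided p value for z = {z_observed}: {p_value:.4f}")
print(f"chi-square (1 df) critical value: {chi2_1df_crit:.4f}")
```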